AITopics | Forsyth County

Collaborating Authors

Forsyth County

Random Forests as Statistical Procedures: Design, Variance, and Dependence

arXiv.org Machine LearningMar-3-2026

We develop a finite-sample, design-based theory for random forests in which each tree is a randomized conditional predictor acting on fixed covariates and the forest is their Monte Carlo average. An exact variance identity separates Monte Carlo error from a covariance floor that persists under infinite aggregation. The floor arises through two mechanisms: observation reuse, where the same training outcomes receive weight across multiple trees, and partition alignment, where independently generated trees discover similar conditional prediction rules. We prove the floor is strictly positive under minimal conditions and show that alignment persists even when sample splitting eliminates observation overlap entirely. We introduce procedure-aligned synthetic resampling (PASR) to estimate the covariance floor, decomposing the total prediction uncertainty of a deployed forest into interpretable components. For continuous outcomes, resulting prediction intervals achieve nominal coverage with a theoretically guaranteed conservative bias direction. For classification forests, the PASR estimator is asymptotically unbiased, providing the first pointwise confidence intervals for predicted conditional probabilities from a deployed forest. Nominal coverage is maintained across a range of design configurations for both outcome types, including high-dimensional settings. The underlying theory extends to any tree-based ensemble with an exchangeable tree-generating mechanism.

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Machine Learning

2602.13104

Country:

North America > United States > North Carolina > Forsyth County > Winston-Salem (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report > Experimental Study (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

People Who Say They're Experiencing AI Psychosis Beg the FTC for Help

WIREDOct-22-2025, 10:30:00 GMT

People Who Say They're Experiencing AI Psychosis Beg the FTC for Help The Federal Trade Commission received 200 complaints mentioning ChatGPT between November 2022 and August 2025. Several attributed delusions, paranoia, and spiritual crises to the chatbot. On March 13, a woman from Salt Lake City, Utah called the Federal Trade Commission to file a complaint against OpenAI's ChatGPT. She claimed to be acting "on behalf of her son, who was experiencing a delusional breakdown." "The consumer's son has been interacting with an AI chatbot called ChatGPT, which is advising him not to take his prescribed medication and telling him that his parents are dangerous," reads the FTC's summary of the call.

chatgpt, openai, wired, (12 more...)

WIRED

Country:

North America > United States > Utah > Salt Lake County > Salt Lake City (0.25)
North America > United States > California (0.14)
North America > United States > Virginia (0.05)
(6 more...)

Industry:

Law > Business Law (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.39)

Add feedback

On Using Large Language Models to Enhance Clinically-Driven Missing Data Recovery Algorithms in Electronic Health Records

Lotspeich, Sarah C., Collins, Abbey, Wells, Brian J., Khanna, Ashish K., Rigdon, Joseph, McGowan, Lucy D'Agostino

arXiv.org Artificial IntelligenceOct-7-2025

Objective: Electronic health records (EHR) data are prone to missingness and errors. Previously, we devised an "enriched" chart review protocol where a "roadmap" of auxiliary diagnoses (anchors) was used to recover missing values in EHR data (e.g., a diagnosis of impaired glycemic control might imply that a missing hemoglobin A1c value would be considered unhealthy). Still, chart reviews are expensive and time-intensive, which limits the number of patients whose data can be reviewed. Now, we investigate the accuracy and scalability of a roadmap-driven algorithm, based on ICD-10 codes (International Classification of Diseases, 10th revision), to mimic expert chart reviews and recover missing values. Materials and Methods: In addition to the clinicians' original roadmap from our previous work, we consider new versions that were iteratively refined using large language models (LLM) in conjunction with clinical expertise to expand the list of auxiliary diagnoses. Using chart reviews for 100 patients from the EHR at an extensive learning health system, we examine algorithm performance with different roadmaps. Using the larger study of $1000$ patients, we applied the final algorithm, which used a roadmap with clinician-approved additions from the LLM. Results: The algorithm recovered as much, if not more, missing data as the expert chart reviewers, depending on the roadmap. Discussion: Clinically-driven algorithms (enhanced by LLM) can recover missing EHR data with similar accuracy to chart reviews and can feasibly be applied to large samples. Extending them to monitor other dimensions of data quality (e.g., plausability) is a promising future direction.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.03844

Country:

Africa (0.93)
North America > United States > North Carolina > Forsyth County > Winston-Salem (0.14)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.68)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Automated Retinal Layer and Fluid Segmentation and Cross-sectional Analysis using Spectral Domain Optical Coherence Tomography Images for Diabetic Retinopathy

Chen, S., Ma, D., Raviselvan, M., Sundaramoorthy, S., Popuri, K., Ju, M. J., Sarunic, M. V., Ratra, D., Beg, M. F.

arXiv.org Artificial IntelligenceMar-11-2025

This study presents an AI-driven pipeline for automated retinal segmentation and thickness analysis in diabetic retinopathy (DR) using SD-OCT imaging. A deep neural network was trained to segment ten retinal layers, intra-retinal fluid, and hyperreflective foci (HRF), with performance evaluated across multiple architectures. SwinUNETR achieved the highest segmentation accuracy, while VM-Unet excelled in specific layers. Analysis revealed distinct thickness variations between NPDR and PDR, with correlations between layer thickness and visual acuity. The proposed method enhances DR assessment by reducing manual annotation effort and providing clinically relevant thickness maps for disease monitoring and treatment planning.

segmentation, thickness, vm-unet, (13 more...)

arXiv.org Artificial Intelligence

2503.01248

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > North Carolina > Forsyth County > Winston-Salem (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)

Add feedback

Faster and Space Efficient Indexing for Locality Sensitive Hashing

Verma, Bhisham Dev, Pratap, Rameshwar

arXiv.org Artificial IntelligenceMar-9-2025

This work suggests faster and space-efficient index construction algorithms for LSH for Euclidean distance (\textit{a.k.a.}~\ELSH) and cosine similarity (\textit{a.k.a.}~\SRP). The index construction step of these LSHs relies on grouping data points into several bins of hash tables based on their hashcode. To generate an $m$-dimensional hashcode of the $d$-dimensional data point, these LSHs first project the data point onto a $d$-dimensional random Gaussian vector and then discretise the resulting inner product. The time and space complexity of both \ELSH~and \SRP~for computing an $m$-sized hashcode of a $d$-dimensional vector is $O(md)$, which becomes impractical for large values of $m$ and $d$. To overcome this problem, we propose two alternative LSH hashcode generation algorithms both for Euclidean distance and cosine similarity, namely, \CSELSH, \HCSELSH~and \CSSRP, \HCSSRP, respectively. \CSELSH~and \CSSRP~are based on count sketch \cite{count_sketch} and \HCSELSH~and \HCSSRP~utilize higher-order count sketch \cite{shi2019higher}. These proposals significantly reduce the hashcode computation time from $O(md)$ to $O(d)$. Additionally, both \CSELSH~and \CSSRP~reduce the space complexity from $O(md)$ to $O(d)$; ~and \HCSELSH, \HCSSRP~ reduce the space complexity from $O(md)$ to $O(N \sqrt[N]{d})$ respectively, where $N\geq 1$ denotes the size of the input/reshaped tensor. Our proposals are backed by strong mathematical guarantees, and we validate their performance through simulations on various real-world datasets.

cov, equation, vector, (14 more...)

arXiv.org Artificial Intelligence

2503.06737

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(8 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.66)

Add feedback

The study of short texts in digital politics: Document aggregation for topic modeling

Nakka, Nitheesha, Yalcin, Omer F., Desmarais, Bruce A., Rajtmajer, Sarah, Monroe, Burt

arXiv.org Artificial IntelligenceMar-6-2025

Statistical topic modeling is widely used in political science to study text. Researchers examine documents of varying lengths, from tweets to speeches. There is ongoing debate on how document length affects the interpretability of topic models. We investigate the effects of aggregating short documents into larger ones based on natural units that partition the corpus. In our study, we analyze one million tweets by U.S. state legislators from April 2016 to September 2020. We find that for documents aggregated at the account level, topics are more associated with individual states than when using individual tweets. This finding is replicated with Wikipedia pages aggregated by birth cities, showing how document definitions can impact topic modeling results.

football, legislator, university, (15 more...)

arXiv.org Artificial Intelligence

2503.05065

Country:

North America > United States > Maryland > Baltimore (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(106 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Media > Music (1.00)
Media > Film (1.00)
(16 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)

Add feedback

PhenoProfiler: Advancing Phenotypic Learning for Image-based Drug Discovery

Li, Bo, Zhang, Bob, Zhang, Chengyang, Zhou, Minghao, Huang, Weiliang, Wang, Shihang, Wang, Qing, Li, Mengran, Zhang, Yong, Song, Qianqian

arXiv.org Artificial IntelligenceFeb-26-2025

In the field of image-based drug discovery, capturing the phenotypic response of cells to various drug treatments and perturbations is a crucial step. However, existing methods require computationally extensive and complex multi-step procedures, which can introduce inefficiencies, limit generalizability, and increase potential errors. To address these challenges, we present PhenoProfiler, an innovative model designed to efficiently and effectively extract morphological representations, enabling the elucidation of phenotypic changes induced by treatments. PhenoProfiler is designed as an end-to-end tool that processes whole-slide multi-channel images directly into low-dimensional quantitative representations, eliminating the extensive computational steps required by existing methods. It also includes a multi-objective learning module to enhance robustness, accuracy, and generalization in morphological representation learning. PhenoProfiler is rigorously evaluated on large-scale publicly available datasets, including over 230,000 whole-slide multi-channel images in end-to-end scenarios and more than 8.42 million single-cell images in non-end-to-end settings. Across these benchmarks, PhenoProfiler consistently outperforms state-of-the-art methods by up to 20%, demonstrating substantial improvements in both accuracy and robustness. Furthermore, PhenoProfiler uses a tailored phenotype correction strategy to emphasize relative phenotypic changes under treatments, facilitating the detection of biologically meaningful signals. UMAP visualizations of treatment profiles demonstrate PhenoProfiler ability to effectively cluster treatments with similar biological annotations, thereby enhancing interpretability. These findings establish PhenoProfiler as a scalable, generalizable, and robust tool for phenotypic learning.

dataset, phenoprofiler, representation, (15 more...)

arXiv.org Artificial Intelligence

2502.19568

Country:

Asia > Macao (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Beijing > Beijing (0.04)
(5 more...)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

xSRL: Safety-Aware Explainable Reinforcement Learning -- Safety as a Product of Explainability

Shefin, Risal Shahriar, Rahman, Md Asifur, Le, Thai, Alqahtani, Sarra

arXiv.org Artificial IntelligenceDec-26-2024

Reinforcement learning (RL) has shown great promise in simulated environments, such as games, where failures have minimal consequences. However, the deployment of RL agents in real-world systems such as autonomous vehicles, robotics, UAVs, and medical devices demands a higher level of safety and transparency, particularly when facing adversarial threats. Safe RL algorithms have been developed to address these concerns by optimizing both task performance and safety constraints. However, errors are inevitable, and when they occur, it is essential that the RL agents can also explain their actions to human operators. This makes trust in the safety mechanisms of RL systems crucial for effective deployment. Explainability plays a key role in building this trust by providing clear, actionable insights into the agent's decision-making process, ensuring that safety-critical decisions are well understood. While machine learning (ML) has seen significant advances in interpretability and visualization, explainability methods for RL remain limited. Current tools fail to address the dynamic, sequential nature of RL and its needs to balance task performance with safety constraints over time. The re-purposing of traditional ML methods, such as saliency maps, is inadequate for safety-critical RL applications where mistakes can result in severe consequences. To bridge this gap, we propose xSRL, a framework that integrates both local and global explanations to provide a comprehensive understanding of RL agents' behavior. xSRL also enables developers to identify policy vulnerabilities through adversarial attacks, offering tools to debug and patch agents without retraining. Our experiments and user studies demonstrate xSRL's effectiveness in increasing safety in RL systems, making them more reliable and trustworthy for real-world deployment. Code is available at https://github.com/risal-shefin/xSRL.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2412.19311

Country:

North America > United States > North Carolina > Forsyth County > Winston-Salem (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan > Wayne County > Detroit (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.54)
Information Technology > Security & Privacy (0.49)
Government > Military (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Aligning Knowledge Concepts to Whole Slide Images for Precise Histopathology Image Analysis

Zhao, Weiqin, Guo, Ziyu, Fan, Yinshuang, Jiang, Yuming, Yeung, Maximus, Yu, Lequan

arXiv.org Artificial IntelligenceNov-27-2024

Due to the large size and lack of fine-grained annotation, Whole Slide Images (WSIs) analysis is commonly approached as a Multiple Instance Learning (MIL) problem. However, previous studies only learn from training data, posing a stark contrast to how human clinicians teach each other and reason about histopathologic entities and factors. Here we present a novel knowledge concept-based MIL framework, named ConcepPath to fill this gap. Specifically, ConcepPath utilizes GPT-4 to induce reliable diseasespecific human expert concepts from medical literature, and incorporate them with a group of purely learnable concepts to extract complementary knowledge from training data. In ConcepPath, WSIs are aligned to these linguistic knowledge concepts by utilizing pathology vision-language model as the basic building component. In the application of lung cancer subtyping, breast cancer HER2 scoring, and gastric cancer immunotherapy-sensitive subtyping task, ConcepPath significantly outperformed previous SOTA methods which lack the guidance of human expert knowledge.

conceppath, expert concept, knowledge, (15 more...)

arXiv.org Artificial Intelligence

2411.18101

Country:

Asia > China > Hong Kong (0.05)
North America > United States > North Carolina > Forsyth County > Winston-Salem (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

Add feedback

Assessing Concordance between RNA-Seq and NanoString Technologies in Ebola-Infected Nonhuman Primates Using Machine Learning

Rezapour, Mostafa, Narayanan, Aarthi, Mowery, Wyatt H., Gurcan, Metin Nafi

arXiv.org Artificial IntelligenceOct-30-2024

This study evaluates the concordance between RNA sequencing (RNA-Seq) and NanoString technologies for gene expression analysis in non-human primates (NHPs) infected with Ebola virus (EBOV). We performed a detailed comparison of both platforms, demonstrating a strong correlation between them, with Spearman coefficients for 56 out of 62 samples ranging from 0.78 to 0.88, with a mean of 0.83 and a median of 0.85. Bland-Altman analysis further confirmed high consistency, with most measurements falling within 95% confidence limits. A machine learning approach, using the Supervised Magnitude-Altitude Scoring (SMAS) method trained on NanoString data, identified OAS1 as a key marker for distinguishing RT-qPCR positive from negative samples. Remarkably, when applied to RNA-Seq data, OAS1 also achieved 100% accuracy in differentiating infected from uninfected samples using logistic regression, demonstrating its robustness across platforms. Further differential expression analysis identified 12 common genes including ISG15, OAS1, IFI44, IFI27, IFIT2, IFIT3, IFI44L, MX1, MX2, OAS2, RSAD2, and OASL which demonstrated the highest levels of statistical significance and biological relevance across both platforms. Gene Ontology (GO) analysis confirmed that these genes are directly involved in key immune and viral infection pathways, reinforcing their importance in EBOV infection. In addition, RNA-Seq uniquely identified genes such as CASP5, USP18, and DDX60, which play key roles in immune regulation and antiviral defense. This finding highlights the broader detection capabilities of RNA-Seq and underscores the complementary strengths of both platforms in providing a comprehensive and accurate assessment of gene expression changes during Ebola virus infection.

platform, regulation, rna-seq, (15 more...)

arXiv.org Artificial Intelligence

2410.23433

Country:

North America > United States > Virginia > Fairfax County > Fairfax (0.04)
North America > United States > North Carolina > Forsyth County > Winston-Salem (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Add feedback